Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add parent dimensions into calculation component table #2753

Merged
merged 5 commits into from
Jul 25, 2023

Conversation

cmgosnell
Copy link
Member

@cmgosnell cmgosnell commented Jul 24, 2023

PR Overview

This PR does three main things:

  • converts the calculation component table to have _parent suffix for the parent fact/table columns instead of _calc suffix for all of the calculation components
  • adds the parent dimensions into the calc component table by:
    • adding total == rest of dimension value records
    • assuming non-total dimensions should have the same dimensions for parent & child.

I recommend starting from the bottom of the files changes: test -> transform -> output.

PR Checklist

  • Merge the most recent version of the branch you are merging into (probably dev).
  • All CI checks are passing. Run tests locally to debug failures
  • Make sure you've included good docstrings.
  • For major data coverage & analysis changes, run data validation tests
  • Include unit tests for new functions and classes.
  • Defensive data quality/sanity checks in analyses & data processing functions.
  • Update the release notes and reference reference the PR and related issues.
  • Do your own explanatory review of the PR to help the reviewer understand what's going on and identify issues preemptively.

@cmgosnell cmgosnell changed the title convert calc component table to have _parent instead of _calc and add… Add parent dimensions into calculation component table Jul 24, 2023
@codecov
Copy link

codecov bot commented Jul 25, 2023

Codecov Report

Patch coverage: 100.0% and no project coverage change.

Comparison is base (089121a) 88.4% compared to head (7632b6e) 88.4%.

❗ Current head 7632b6e differs from pull request most recent head 68b60cb. Consider uploading reports for the commit 68b60cb to get more accurate results

Additional details and impacted files
@@              Coverage Diff              @@
##           explode_ferc1   #2753   +/-   ##
=============================================
  Coverage           88.4%   88.4%           
=============================================
  Files                 88      89    +1     
  Lines              10675   10711   +36     
=============================================
+ Hits                9440    9476   +36     
  Misses              1235    1235           
Files Changed Coverage Δ
src/pudl/extract/ferc6.py 100.0% <ø> (ø)
src/pudl/metadata/sources.py 100.0% <ø> (ø)
src/pudl/output/eia923.py 97.7% <ø> (ø)
src/pudl/output/sql/helpers.py 100.0% <ø> (ø)
src/pudl/extract/ferc.py 100.0% <100.0%> (ø)
src/pudl/extract/ferc60.py 100.0% <100.0%> (ø)
src/pudl/output/ferc1.py 88.6% <100.0%> (+<0.1%) ⬆️
src/pudl/settings.py 98.8% <100.0%> (+<0.1%) ⬆️
src/pudl/transform/ferc1.py 97.0% <100.0%> (+<0.1%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@zaneselvans zaneselvans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can collaborate on some docstrings on the phone? You explain and I try to translate?

src/pudl/transform/ferc1.py Outdated Show resolved Hide resolved
Comment on lines 6787 to 6788
Then, we assume that every parent factoid should have the same dimensions as its
calculation component dimensions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you mean that every parent factoid should have the same dimension values as its calculation components? E.g. if a parent has utility_type=="electric" then all of its components are also electric? Not that if a parent has a utility_type dimension then the child should also have a utility_type dimension.

Aren't parent nodes with the value total in any dimension an exception to this?

Maybe we mean "With the exception of parents having a value of total in some dimension, we assume that all parents should have the same dimension values as the calculation components that they are composed of."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a description of what we expect to be true of the dataframe that's reported at the end of this process? Or maybe you can explain it to me on the phone and I could write it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i added some context of what these tables looked like coming into this function as well as what actually happens/what is expected on the way out. very happy to chat on the phone about it. i also recommend going to looking at the unit test.

src/pudl/transform/ferc1.py Outdated Show resolved Hide resolved
src/pudl/transform/ferc1.py Outdated Show resolved Hide resolved
src/pudl/transform/ferc1.py Outdated Show resolved Hide resolved
src/pudl/transform/ferc1.py Show resolved Hide resolved
src/pudl/transform/ferc1.py Outdated Show resolved Hide resolved
Comment on lines 6810 to 6815
calc_comp_idx = [
"table_name_parent",
"xbrl_factoid_parent",
"table_name",
"xbrl_factoid",
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the PKs necessarily not include the dimension columns? E.g. couldn't there be cases where we define different calculations depending utility_type? This almost happens where we have one calculation with lots of details from the plant_in_service_ferc1 table, and a different simpler calculation for the gas and other utility types. In that case I think it happens to be the case that the calculation components are being reported in different tables, but does that necessarily have to be true?

Or is it the case that at this point in the processing of the calculation components table we're still adding values to the dimension columns, and at the end of this step they're going to be part of the PKs in a way that they weren't before? That seems true with respect to the addition of the total calculations, but is it true more generally?

@zaneselvans zaneselvans added ferc1 Anything having to do with FERC Form 1 metadata Anything having to do with the content, formatting, or storage of metadata. Mostly datapackages. xbrl Related to the FERC XBRL transition labels Jul 25, 2023
@cmgosnell cmgosnell merged commit 1a6efa4 into explode_ferc1 Jul 25, 2023
4 checks passed
@cmgosnell cmgosnell deleted the explode_add_parent_dims branch July 25, 2023 19:15
@cmgosnell cmgosnell mentioned this pull request Jul 27, 2023
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ferc1 Anything having to do with FERC Form 1 metadata Anything having to do with the content, formatting, or storage of metadata. Mostly datapackages. xbrl Related to the FERC XBRL transition
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants